Dataset statistics
| Number of variables | 10 |
|---|---|
| Number of observations | 710 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 128.8 KiB |
| Average record size in memory | 185.8 B |
Variable types
| DateTime | 1 |
|---|---|
| Numeric | 7 |
| Categorical | 2 |
Alerts
| HML is highly correlated with Mkt_RF and 3 other fields | High correlation |
|---|---|
| CMA is highly correlated with Mkt_RF and 2 other fields | High correlation |
| Mkt_RF is highly correlated with SMB and 4 other fields | High correlation |
| SMB is highly correlated with Mkt_RF and 3 other fields | High correlation |
| RMW is highly correlated with SMB and 3 other fields | High correlation |
| MOM is highly correlated with SMB and 1 other field | High correlation |
| Best is highly correlated with Mkt_RF and 1 other field | High correlation |
| Worst is highly correlated with Mkt_RF and 1 other field | High correlation |
| Date has unique values | Unique |
| RF has 69 (9.7%) zeros | Zeros |
Reproduction
| Analysis started | 2022-10-11 15:44:41.134132 |
|---|---|
| Analysis finished | 2022-10-11 15:44:43.972958 |
| Duration | 2.84 seconds |
| Software version | pandas-profiling v3.3.0 |
| Download configuration | config.json |
Date
| Distinct | 710 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 5.7 KiB |
| Minimum | 1963-07-01 00:00:00 |
|---|---|
| Maximum | 2022-08-01 00:00:00 |
Histogram with fixed size bins (bins=50)
Mkt_RF
| Distinct | 566 |
|---|---|
| Distinct (%) | 79.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.005572535211 |
| Minimum | -0.2324 |
|---|---|
| Maximum | 0.161 |
| Zeros | 1 |
| Zeros (%) | 0.1% |
| Negative | 285 |
| Negative (%) | 40.1% |
| Memory size | 5.7 KiB |
Quantile statistics
| Minimum | -0.2324 |
|---|---|
| 5-th percentile | -0.072375 |
| Q1 | -0.019675 |
| median | 0.00915 |
| Q3 | 0.034 |
| 95-th percentile | 0.070895 |
| Maximum | 0.161 |
| Range | 0.3934 |
| Interquartile range (IQR) | 0.053675 |
Descriptive statistics
| Standard deviation | 0.04477172841 |
|---|---|
| Coefficient of variation (CV) | 8.034355408 |
| Kurtosis | 1.83251363 |
| Mean | 0.005572535211 |
| Median Absolute Deviation (MAD) | 0.02695 |
| Skewness | -0.5032989172 |
| Sum | 3.9565 |
| Variance | 0.002004507665 |
| Monotonicity | Not monotonic |
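Several of the figures above are simple functions of one another, which makes them easy to sanity-check. The sketch below is not part of the profiling tool; it copies values from the tables above (which appear to describe the Mkt_RF column) and recomputes the mean, coefficient of variation, variance, range and IQR. All variable names are illustrative.

```python
# Values copied from the summary tables above (assumed to describe Mkt_RF).
n_obs = 710                        # "Number of observations"
total = 3.9565                     # "Sum"
std = 0.04477172841                # "Standard deviation"
minimum, maximum = -0.2324, 0.161  # "Minimum", "Maximum"
q1, q3 = -0.019675, 0.034          # "Q1", "Q3"

mean = total / n_obs               # arithmetic mean
cv = std / mean                    # coefficient of variation
variance = std ** 2                # variance is the squared standard deviation
value_range = maximum - minimum    # "Range"
iqr = q3 - q1                      # "Interquartile range (IQR)"

print(f"mean={mean:.12f} cv={cv:.6f} variance={variance:.9f}")
print(f"range={value_range:.4f} iqr={iqr:.6f}")
```

The recomputed values agree with the reported ones to the precision shown in the tables.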
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| -0.0138 | 3 | 0.4% |
| -0.0144 | 3 | 0.4% |
| 0.0103 | 3 | 0.4% |
| 0.014 | 3 | 0.4% |
| 0.0078 | 3 | 0.4% |
| -0.0229 | 3 | 0.4% |
| 0.0693 | 3 | 0.4% |
| 0.0143 | 3 | 0.4% |
| 0.0311 | 3 | 0.4% |
| 0.0206 | 3 | 0.4% |
| Other values (556) | 680 |
| Value | Count | Frequency (%) |
| -0.2324 | 1 | |
| -0.1723 | 1 | |
| -0.1608 | 1 | |
| -0.1339 | 1 | |
| -0.129 | 1 | |
| -0.1275 | 1 | |
| -0.1191 | 1 | |
| -0.1177 | 1 | |
| -0.11 | 1 | |
| -0.1072 | 1 |
| Value | Count | Frequency (%) |
| 0.161 | 1 | |
| 0.1366 | 1 | |
| 0.1365 | 1 | |
| 0.1247 | 2 | |
| 0.1216 | 1 | |
| 0.1135 | 1 | |
| 0.113 | 1 | |
| 0.1114 | 1 | |
| 0.1084 | 1 | |
| 0.1028 | 1 |
SMB
| Distinct | 510 |
|---|---|
| Distinct (%) | 71.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.00227056338 |
| Minimum | -0.1535 |
|---|---|
| Maximum | 0.1834 |
| Zeros | 2 |
| Zeros (%) | 0.3% |
| Negative | 340 |
| Negative (%) | 47.9% |
| Memory size | 5.7 KiB |
Quantile statistics
| Minimum | -0.1535 |
|---|---|
| 5-th percentile | -0.042955 |
| Q1 | -0.015175 |
| median | 0.001 |
| Q3 | 0.02035 |
| 95-th percentile | 0.04914 |
| Maximum | 0.1834 |
| Range | 0.3369 |
| Interquartile range (IQR) | 0.035525 |
Descriptive statistics
| Standard deviation | 0.03024648984 |
|---|---|
| Coefficient of variation (CV) | 13.32113876 |
| Kurtosis | 3.135512646 |
| Mean | 0.00227056338 |
| Median Absolute Deviation (MAD) | 0.018 |
| Skewness | 0.3422554032 |
| Sum | 1.6121 |
| Variance | 0.0009148501478 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0.0013 | 5 | 0.7% |
| 0.0271 | 4 | 0.6% |
| -0.0139 | 4 | 0.6% |
| -0.0114 | 4 | 0.6% |
| 0.0031 | 4 | 0.6% |
| -0.0062 | 4 | 0.6% |
| -0.0041 | 3 | 0.4% |
| 0.0192 | 3 | 0.4% |
| -0.0107 | 3 | 0.4% |
| -0.0005 | 3 | 0.4% |
| Other values (500) | 673 |
| Value | Count | Frequency (%) |
| -0.1535 | 1 | |
| -0.1002 | 1 | |
| -0.0831 | 1 | |
| -0.0807 | 1 | |
| -0.0728 | 1 | |
| -0.0693 | 1 | |
| -0.0691 | 1 | |
| -0.0682 | 1 | |
| -0.0645 | 1 | |
| -0.0643 | 1 |
| Value | Count | Frequency (%) |
| 0.1834 | 1 | |
| 0.1291 | 1 | |
| 0.1041 | 1 | |
| 0.0993 | 1 | |
| 0.0918 | 1 | |
| 0.091 | 1 | |
| 0.0851 | 1 | |
| 0.0799 | 1 | |
| 0.0761 | 1 | |
| 0.0754 | 1 |
HML
| Distinct | 498 |
|---|---|
| Distinct (%) | 70.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.00298084507 |
| Minimum | -0.1397 |
|---|---|
| Maximum | 0.1275 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 326 |
| Negative (%) | 45.9% |
| Memory size | 5.7 KiB |
Quantile statistics
| Minimum | -0.1397 |
|---|---|
| 5-th percentile | -0.041 |
| Q1 | -0.013875 |
| median | 0.00245 |
| Q3 | 0.0175 |
| 95-th percentile | 0.05401 |
| Maximum | 0.1275 |
| Range | 0.2672 |
| Interquartile range (IQR) | 0.031375 |
Descriptive statistics
| Standard deviation | 0.02966002661 |
|---|---|
| Coefficient of variation (CV) | 9.950207377 |
| Kurtosis | 2.379070988 |
| Mean | 0.00298084507 |
| Median Absolute Deviation (MAD) | 0.0158 |
| Skewness | 0.1068899396 |
| Sum | 2.1164 |
| Variance | 0.0008797171784 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0.0085 | 7 | 1.0% |
| 0.0117 | 4 | 0.6% |
| 0.0015 | 4 | 0.6% |
| 0.0175 | 4 | 0.6% |
| -0.0002 | 4 | 0.6% |
| -0.0013 | 4 | 0.6% |
| 0.0043 | 4 | 0.6% |
| 0.0119 | 4 | 0.6% |
| 0.0227 | 4 | 0.6% |
| -0.0276 | 3 | 0.4% |
| Other values (488) | 668 |
| Value | Count | Frequency (%) |
| -0.1397 | 1 | |
| -0.1129 | 1 | |
| -0.0987 | 1 | |
| -0.097 | 1 | |
| -0.0843 | 1 | |
| -0.0833 | 1 | |
| -0.0832 | 1 | |
| -0.0782 | 1 | |
| -0.0766 | 1 | |
| -0.0695 | 1 |
| Value | Count | Frequency (%) |
| 0.1275 | 1 | |
| 0.1248 | 1 | |
| 0.1232 | 1 | |
| 0.0863 | 1 | |
| 0.0841 | 1 | |
| 0.083 | 1 | |
| 0.0828 | 1 | |
| 0.0819 | 1 | |
| 0.0817 | 1 | |
| 0.0763 | 1 |
RMW
| Distinct | 446 |
|---|---|
| Distinct (%) | 62.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.002724225352 |
| Minimum | -0.1873 |
|---|---|
| Maximum | 0.1309 |
| Zeros | 1 |
| Zeros (%) | 0.1% |
| Negative | 309 |
| Negative (%) | 43.5% |
| Memory size | 5.7 KiB |
Quantile statistics
| Minimum | -0.1873 |
|---|---|
| 5-th percentile | -0.027485 |
| Q1 | -0.007875 |
| median | 0.0024 |
| Q3 | 0.013075 |
| 95-th percentile | 0.03471 |
| Maximum | 0.1309 |
| Range | 0.3182 |
| Interquartile range (IQR) | 0.02095 |
Descriptive statistics
| Standard deviation | 0.02215376522 |
|---|---|
| Coefficient of variation (CV) | 8.132133858 |
| Kurtosis | 11.54272923 |
| Mean | 0.002724225352 |
| Median Absolute Deviation (MAD) | 0.0106 |
| Skewness | -0.2997310721 |
| Sum | 1.9342 |
| Variance | 0.0004907893136 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0.003 | 7 | 1.0% |
| 0.013 | 5 | 0.7% |
| 0.0027 | 5 | 0.7% |
| 0.0131 | 5 | 0.7% |
| -0.0068 | 5 | 0.7% |
| 0.0204 | 4 | 0.6% |
| 0.0008 | 4 | 0.6% |
| -0.0137 | 4 | 0.6% |
| 0.0093 | 4 | 0.6% |
| -0.0042 | 4 | 0.6% |
| Other values (436) | 663 |
| Value | Count | Frequency (%) |
| -0.1873 | 1 | |
| -0.0921 | 1 | |
| -0.0832 | 1 | |
| -0.076 | 1 | |
| -0.0706 | 1 | |
| -0.0631 | 1 | |
| -0.048 | 1 | |
| -0.047 | 2 | |
| -0.0462 | 1 | |
| -0.0444 | 1 |
| Value | Count | Frequency (%) |
| 0.1309 | 1 | |
| 0.1182 | 1 | |
| 0.096 | 1 | |
| 0.0911 | 1 | |
| 0.0806 | 1 | |
| 0.0766 | 1 | |
| 0.0742 | 1 | |
| 0.0722 | 1 | |
| 0.0646 | 1 | |
| 0.0629 | 1 |
CMA
| Distinct | 443 |
|---|---|
| Distinct (%) | 62.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.002843380282 |
| Minimum | -0.0694 |
|---|---|
| Maximum | 0.0905 |
| Zeros | 1 |
| Zeros (%) | 0.1% |
| Negative | 329 |
| Negative (%) | 46.3% |
| Memory size | 5.7 KiB |
Quantile statistics
| Minimum | -0.0694 |
|---|---|
| 5-th percentile | -0.026555 |
| Q1 | -0.01 |
| median | 0.00095 |
| Q3 | 0.0149 |
| 95-th percentile | 0.03681 |
| Maximum | 0.0905 |
| Range | 0.1599 |
| Interquartile range (IQR) | 0.0249 |
Descriptive statistics
| Standard deviation | 0.02039952998 |
|---|---|
| Coefficient of variation (CV) | 7.174393842 |
| Kurtosis | 1.426598638 |
| Mean | 0.002843380282 |
| Median Absolute Deviation (MAD) | 0.01255 |
| Skewness | 0.3021728958 |
| Sum | 2.0188 |
| Variance | 0.0004161408235 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| -0.0034 | 6 | 0.8% |
| 0.0009 | 5 | 0.7% |
| 0.009 | 5 | 0.7% |
| -0.0004 | 5 | 0.7% |
| 0.0084 | 5 | 0.7% |
| -0.012 | 4 | 0.6% |
| 0.0046 | 4 | 0.6% |
| -0.016 | 4 | 0.6% |
| -0.0033 | 4 | 0.6% |
| -0.0095 | 4 | 0.6% |
| Other values (433) | 664 |
| Value | Count | Frequency (%) |
| -0.0694 | 1 | |
| -0.0677 | 1 | |
| -0.0662 | 1 | |
| -0.0583 | 1 | |
| -0.0566 | 1 | |
| -0.0563 | 1 | |
| -0.05 | 1 | |
| -0.0474 | 1 | |
| -0.047 | 1 | |
| -0.0454 | 1 |
| Value | Count | Frequency (%) |
| 0.0905 | 1 | |
| 0.0839 | 1 | |
| 0.0771 | 1 | |
| 0.0656 | 1 | |
| 0.0646 | 1 | |
| 0.0621 | 1 | |
| 0.0592 | 1 | |
| 0.0591 | 1 | |
| 0.0589 | 1 | |
| 0.0565 | 1 |
MOM
| Distinct | 539 |
|---|---|
| Distinct (%) | 75.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.006294225352 |
| Minimum | -0.343 |
|---|---|
| Maximum | 0.182 |
| Zeros | 1 |
| Zeros (%) | 0.1% |
| Negative | 264 |
| Negative (%) | 37.2% |
| Memory size | 5.7 KiB |
Quantile statistics
| Minimum | -0.343 |
|---|---|
| 5-th percentile | -0.06532 |
| Q1 | -0.009525 |
| median | 0.00735 |
| Q3 | 0.028975 |
| 95-th percentile | 0.064205 |
| Maximum | 0.182 |
| Range | 0.525 |
| Interquartile range (IQR) | 0.0385 |
Descriptive statistics
| Standard deviation | 0.0419055474 |
|---|---|
| Coefficient of variation (CV) | 6.657776781 |
| Kurtosis | 9.952008228 |
| Mean | 0.006294225352 |
| Median Absolute Deviation (MAD) | 0.0192 |
| Skewness | -1.283579652 |
| Sum | 4.4689 |
| Variance | 0.001756074903 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0.0316 | 5 | 0.7% |
| 0.0022 | 5 | 0.7% |
| 0.0004 | 4 | 0.6% |
| 0.0086 | 4 | 0.6% |
| -0.0058 | 4 | 0.6% |
| 0.009 | 3 | 0.4% |
| 0.0252 | 3 | 0.4% |
| 0.0445 | 3 | 0.4% |
| 0.0303 | 3 | 0.4% |
| -0.0184 | 3 | 0.4% |
| Other values (529) | 673 |
| Value | Count | Frequency (%) |
| -0.343 | 1 | |
| -0.253 | 1 | |
| -0.1633 | 1 | |
| -0.1382 | 1 | |
| -0.1249 | 1 | |
| -0.1243 | 1 | |
| -0.1187 | 1 | |
| -0.1157 | 1 | |
| -0.107 | 1 | |
| -0.0955 | 1 |
| Value | Count | Frequency (%) |
| 0.182 | 1 | |
| 0.166 | 1 | |
| 0.1522 | 1 | |
| 0.1322 | 1 | |
| 0.1275 | 1 | |
| 0.1257 | 1 | |
| 0.1148 | 1 | |
| 0.1038 | 1 | |
| 0.0998 | 1 | |
| 0.0964 | 1 |
RF
| Distinct | 106 |
|---|---|
| Distinct (%) | 14.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.003626338028 |
| Minimum | 0 |
|---|---|
| Maximum | 0.0135 |
| Zeros | 69 |
| Zeros (%) | 9.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 5.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0.0014 |
| median | 0.0038 |
| Q3 | 0.0051 |
| 95-th percentile | 0.0081 |
| Maximum | 0.0135 |
| Range | 0.0135 |
| Interquartile range (IQR) | 0.0037 |
Descriptive statistics
| Standard deviation | 0.002682255718 |
|---|---|
| Coefficient of variation (CV) | 0.7396595953 |
| Kurtosis | 0.6327734726 |
| Mean | 0.003626338028 |
| Median Absolute Deviation (MAD) | 0.00175 |
| Skewness | 0.6596971813 |
| Sum | 2.5747 |
| Variance | 7.194495739 × 10⁻⁶ |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 69 | 9.7% |
| 0.0001 | 44 | 6.2% |
| 0.0043 | 21 | 3.0% |
| 0.004 | 21 | 3.0% |
| 0.0042 | 18 | 2.5% |
| 0.0046 | 18 | 2.5% |
| 0.0039 | 16 | 2.3% |
| 0.0031 | 16 | 2.3% |
| 0.0044 | 16 | 2.3% |
| 0.0037 | 15 | 2.1% |
| Other values (96) | 456 |
| Value | Count | Frequency (%) |
| 0 | 69 | |
| 0.0001 | 44 | |
| 0.0002 | 8 | 1.1% |
| 0.0003 | 4 | 0.6% |
| 0.0004 | 2 | 0.3% |
| 0.0005 | 1 | 0.1% |
| 0.0006 | 5 | 0.7% |
| 0.0007 | 6 | 0.8% |
| 0.0008 | 7 | 1.0% |
| 0.0009 | 7 | 1.0% |
| Value | Count | Frequency (%) |
| 0.0135 | 1 | 0.1% |
| 0.0131 | 1 | 0.1% |
| 0.0128 | 1 | 0.1% |
| 0.0126 | 1 | 0.1% |
| 0.0124 | 2 | |
| 0.0121 | 3 | |
| 0.0115 | 1 | 0.1% |
| 0.0113 | 1 | 0.1% |
| 0.0108 | 1 | 0.1% |
| 0.0107 | 2 |
Best
| Distinct | 7 |
|---|---|
| Distinct (%) | 1.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 42.4 KiB |
| Mkt_RF | |
|---|---|
| MOM | |
| HML | |
| SMB | |
| RMW | |
| Other values (2) |
Length
| Max length | 6 |
|---|---|
| Median length | 3 |
| Mean length | 3.902816901 |
| Min length | 2 |
Characters and Unicode
| Total characters | 2771 |
|---|---|
| Distinct characters | 14 |
| Distinct categories | 3 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | MOM |
|---|---|
| 2nd row | Mkt_RF |
| 3rd row | CMA |
| 4th row | MOM |
| 5th row | CMA |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| Mkt_RF | 215 | 30.3% |
| MOM | 165 | 23.2% |
| HML | 104 | 14.6% |
| SMB | 96 | 13.5% |
| RMW | 70 | 9.9% |
| CMA | 56 | 7.9% |
| RF | 4 | 0.6% |
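The length and character figures reported for this variable follow directly from the category counts in the Common Values table. As a rough consistency check (a sketch with illustrative names, not the profiler's own code), the counts can be expanded into total characters, mean label length and per-character frequencies:

```python
from collections import Counter

# Category counts copied from the Common Values table above.
counts = {"Mkt_RF": 215, "MOM": 165, "HML": 104, "SMB": 96,
          "RMW": 70, "CMA": 56, "RF": 4}

n_rows = sum(counts.values())
total_chars = sum(len(label) * c for label, c in counts.items())
mean_length = total_chars / n_rows

# Character frequencies, weighting each label by how often it occurs.
char_counts = Counter()
for label, c in counts.items():
    for ch in label:
        char_counts[ch] += c

print(n_rows, total_chars, round(mean_length, 9))
print(char_counts.most_common(3))
```

This reproduces the 710 observations, the 2771 total characters and the mean length of about 3.9028 listed above, with M as the most frequent character.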
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| mkt_rf | 215 | |
| mom | 165 | |
| hml | 104 | |
| smb | 96 | |
| rmw | 70 | 9.9% |
| cma | 56 | 7.9% |
| rf | 4 | 0.6% |
Most occurring characters
| Value | Count | Frequency (%) |
| M | 871 | |
| R | 289 | 10.4% |
| F | 219 | 7.9% |
| k | 215 | 7.8% |
| t | 215 | 7.8% |
| _ | 215 | 7.8% |
| O | 165 | 6.0% |
| H | 104 | 3.8% |
| L | 104 | 3.8% |
| S | 96 | 3.5% |
| Other values (4) | 278 | 10.0% |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 2126 | |
| Lowercase Letter | 430 | 15.5% |
| Connector Punctuation | 215 | 7.8% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| M | 871 | |
| R | 289 | 13.6% |
| F | 219 | 10.3% |
| O | 165 | 7.8% |
| H | 104 | 4.9% |
| L | 104 | 4.9% |
| S | 96 | 4.5% |
| B | 96 | 4.5% |
| W | 70 | 3.3% |
| C | 56 | 2.6% |
Lowercase Letter
| Value | Count | Frequency (%) |
| k | 215 | |
| t | 215 |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 215 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 2556 | |
| Common | 215 | 7.8% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| M | 871 | |
| R | 289 | 11.3% |
| F | 219 | 8.6% |
| k | 215 | 8.4% |
| t | 215 | 8.4% |
| O | 165 | 6.5% |
| H | 104 | 4.1% |
| L | 104 | 4.1% |
| S | 96 | 3.8% |
| B | 96 | 3.8% |
| Other values (3) | 182 | 7.1% |
Common
| Value | Count | Frequency (%) |
| _ | 215 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 2771 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| M | 871 | |
| R | 289 | 10.4% |
| F | 219 | 7.9% |
| k | 215 | 7.8% |
| t | 215 | 7.8% |
| _ | 215 | 7.8% |
| O | 165 | 6.0% |
| H | 104 | 3.8% |
| L | 104 | 3.8% |
| S | 96 | 3.5% |
| Other values (4) | 278 | 10.0% |
Worst
| Distinct | 7 |
|---|---|
| Distinct (%) | 1.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 42.2 KiB |
| Mkt_RF | |
|---|---|
| SMB | |
| HML | |
| MOM | |
| RMW | |
| Other values (2) |
Length
| Max length | 6 |
|---|---|
| Median length | 3 |
| Mean length | 3.743661972 |
| Min length | 2 |
Characters and Unicode
| Total characters | 2658 |
|---|---|
| Distinct characters | 14 |
| Distinct categories | 3 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | CMA |
|---|---|
| 2nd row | SMB |
| 3rd row | Mkt_RF |
| 4th row | CMA |
| 5th row | SMB |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| Mkt_RF | 178 | 25.1% |
| SMB | 122 | 17.2% |
| HML | 122 | 17.2% |
| MOM | 116 | 16.3% |
| RMW | 94 | 13.2% |
| CMA | 72 | 10.1% |
| RF | 6 | 0.8% |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| mkt_rf | 178 | |
| smb | 122 | |
| hml | 122 | |
| mom | 116 | |
| rmw | 94 | |
| cma | 72 | |
| rf | 6 | 0.8% |
Most occurring characters
| Value | Count | Frequency (%) |
| M | 820 | |
| R | 278 | 10.5% |
| F | 184 | 6.9% |
| k | 178 | 6.7% |
| t | 178 | 6.7% |
| _ | 178 | 6.7% |
| S | 122 | 4.6% |
| B | 122 | 4.6% |
| H | 122 | 4.6% |
| L | 122 | 4.6% |
| Other values (4) | 354 |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 2124 | |
| Lowercase Letter | 356 | 13.4% |
| Connector Punctuation | 178 | 6.7% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| M | 820 | |
| R | 278 | 13.1% |
| F | 184 | 8.7% |
| S | 122 | 5.7% |
| B | 122 | 5.7% |
| H | 122 | 5.7% |
| L | 122 | 5.7% |
| O | 116 | 5.5% |
| W | 94 | 4.4% |
| C | 72 | 3.4% |
Lowercase Letter
| Value | Count | Frequency (%) |
| k | 178 | |
| t | 178 |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 178 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 2480 | |
| Common | 178 | 6.7% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| M | 820 | |
| R | 278 | 11.2% |
| F | 184 | 7.4% |
| k | 178 | 7.2% |
| t | 178 | 7.2% |
| S | 122 | 4.9% |
| B | 122 | 4.9% |
| H | 122 | 4.9% |
| L | 122 | 4.9% |
| O | 116 | 4.7% |
| Other values (3) | 238 | 9.6% |
Common
| Value | Count | Frequency (%) |
| _ | 178 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 2658 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| M | 820 | |
| R | 278 | 10.5% |
| F | 184 | 6.9% |
| k | 178 | 6.7% |
| t | 178 | 6.7% |
| _ | 178 | 6.7% |
| S | 122 | 4.6% |
| B | 122 | 4.6% |
| H | 122 | 4.6% |
| L | 122 | 4.6% |
| Other values (4) | 354 |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
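The definition of ρ above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the code the report uses: the helper names `average_ranks` and `spearman_rho` are hypothetical, and ties are assumed to receive the mean of the ranks they span.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Covariance of the rank variables divided by the product of their standard deviations."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    # The 1/n factors cancel between numerator and denominator.
    return cov / sqrt(var_x * var_y)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [v ** 3 for v in x]  # nonlinear, but strictly increasing in x
rho = spearman_rho(x, y)
print(rho)
```

Because y is a strictly increasing function of x, the ranks coincide and ρ comes out at 1 (up to floating-point rounding), even though the relationship is not linear.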
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
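As a minimal sketch of this definition (the function name `pearson_r` is illustrative, and the five-row sample is far too small to reproduce the report's correlations), r can be computed directly and its invariance under changes of location and positive scale checked:

```python
from math import sqrt

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    return cov / sqrt(var_x * var_y)

# First five Mkt_RF and SMB values from the "First rows" table.
mkt_rf = [-0.0039, 0.0507, -0.0157, 0.0253, -0.0085]
smb = [-0.0041, -0.0080, -0.0052, -0.0139, -0.0088]

r = pearson_r(mkt_rf, smb)
# Shift and rescale each variable separately (positive scale factors).
r_rescaled = pearson_r([100 * v + 3 for v in mkt_rf], [0.5 * v - 1 for v in smb])
print(r, r_rescaled)
```

The two printed values agree up to floating-point rounding, which is the invariance property described above.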
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
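The pair-counting formula above corresponds to the variant usually called tau-a. The sketch below implements exactly that formula with a hypothetical helper `kendall_tau_a`; note that libraries may use a tie-adjusted variant instead (SciPy's `kendalltau`, for example, defaults to tau-b), so results can differ when the data contain ties.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, with no tie adjustment."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables order the pair the same way
        elif sign < 0:
            discordant += 1   # the variables order the pair in opposite ways
        # sign == 0 means a tie in x or y; it counts toward neither total
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

x = [1, 2, 3, 4]
y = [1, 3, 2, 4]  # one swapped pair relative to x
tau = kendall_tau_a(x, y)
print(tau)
```

With four observations there are six pairs; five are concordant and one is discordant, so τ = (5 - 1) / 6.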
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
First rows
| | Date | Mkt_RF | SMB | HML | RMW | CMA | MOM | RF | Best | Worst |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1963-07-01 | -0.0039 | -0.0041 | -0.0097 | 0.0068 | -0.0118 | 0.0090 | 0.0027 | MOM | CMA |
| 1 | 1963-08-01 | 0.0507 | -0.0080 | 0.0180 | 0.0036 | -0.0035 | 0.0101 | 0.0025 | Mkt_RF | SMB |
| 2 | 1963-09-01 | -0.0157 | -0.0052 | 0.0013 | -0.0071 | 0.0029 | 0.0019 | 0.0027 | CMA | Mkt_RF |
| 3 | 1963-10-01 | 0.0253 | -0.0139 | -0.0010 | 0.0280 | -0.0201 | 0.0312 | 0.0029 | MOM | CMA |
| 4 | 1963-11-01 | -0.0085 | -0.0088 | 0.0175 | -0.0051 | 0.0224 | -0.0074 | 0.0027 | CMA | SMB |
| 5 | 1963-12-01 | 0.0183 | -0.0210 | -0.0002 | 0.0003 | -0.0007 | 0.0175 | 0.0029 | Mkt_RF | SMB |
| 6 | 1964-01-01 | 0.0224 | 0.0013 | 0.0148 | 0.0017 | 0.0147 | 0.0086 | 0.0030 | Mkt_RF | SMB |
| 7 | 1964-02-01 | 0.0154 | 0.0028 | 0.0281 | -0.0005 | 0.0091 | 0.0026 | 0.0026 | HML | RMW |
| 8 | 1964-03-01 | 0.0141 | 0.0123 | 0.0340 | -0.0221 | 0.0322 | 0.0075 | 0.0031 | HML | RMW |
| 9 | 1964-04-01 | 0.0010 | -0.0152 | -0.0067 | -0.0127 | -0.0108 | -0.0058 | 0.0029 | RF | SMB |
Last rows
| | Date | Mkt_RF | SMB | HML | RMW | CMA | MOM | RF | Best | Worst |
|---|---|---|---|---|---|---|---|---|---|---|
| 700 | 2021-11-01 | -0.0155 | -0.0176 | -0.0044 | 0.0722 | 0.0174 | 0.0090 | 0.0000 | RMW | SMB |
| 701 | 2021-12-01 | 0.0310 | -0.0077 | 0.0328 | 0.0492 | 0.0443 | -0.0260 | 0.0001 | RMW | MOM |
| 702 | 2022-01-01 | -0.0625 | -0.0405 | 0.1275 | 0.0087 | 0.0771 | -0.0259 | 0.0000 | HML | Mkt_RF |
| 703 | 2022-02-01 | -0.0229 | 0.0296 | 0.0304 | -0.0208 | 0.0313 | 0.0176 | 0.0000 | CMA | Mkt_RF |
| 704 | 2022-03-01 | 0.0305 | -0.0215 | -0.0180 | -0.0156 | 0.0317 | 0.0300 | 0.0001 | CMA | SMB |
| 705 | 2022-04-01 | -0.0946 | -0.0040 | 0.0619 | 0.0363 | 0.0592 | 0.0489 | 0.0001 | HML | Mkt_RF |
| 706 | 2022-05-01 | -0.0034 | -0.0006 | 0.0841 | 0.0144 | 0.0398 | 0.0248 | 0.0003 | HML | Mkt_RF |
| 707 | 2022-06-01 | -0.0843 | 0.0130 | -0.0597 | 0.0185 | -0.0470 | 0.0079 | 0.0006 | RMW | Mkt_RF |
| 708 | 2022-07-01 | 0.0957 | 0.0187 | -0.0410 | 0.0068 | -0.0694 | -0.0396 | 0.0008 | Mkt_RF | CMA |
| 709 | 2022-08-01 | -0.0378 | 0.0151 | 0.0031 | -0.0480 | 0.0131 | 0.0209 | 0.0019 | MOM | RMW |